AITopics | local policy

Collaborating Authors

local policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Guided Policy Search via Approximate Mirror Descent

Neural Information Processing SystemsMay-1-2026, 06:07:20 GMT

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.

local policy, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

FOVA: Offline Federated Reinforcement Learning with Mixed-Quality Data

Qiao, Nan, Yue, Sheng, Ren, Ju, Zhang, Yaoxue

arXiv.org Artificial IntelligenceDec-3-2025

Offline Federated Reinforcement Learning (FRL), a marriage of federated learning and offline reinforcement learning, has attracted increasing interest recently. Albeit with some advancement, we find that the performance of most existing offline FRL methods drops dramatically when provided with mixed-quality data, that is, the logging behaviors (offline data) are collected by policies with varying qualities across clients. To overcome this limitation, this paper introduces a new vote-based offline FRL framework, named FOVA. It exploits a \emph{vote mechanism} to identify high-return actions during local policy evaluation, alleviating the negative effect of low-quality behaviors from diverse local learning policies. Besides, building on advantage-weighted regression (AWR), we construct consistent local and global training objectives, significantly enhancing the efficiency and stability of FOVA. Further, we conduct an extensive theoretical analysis and rigorously show that the policy learned by FOVA enjoys strict policy improvement over the behavioral policy. Extensive experiments corroborate the significant performance gains of our proposed algorithm over existing baselines on widely used benchmarks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2512.0235

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization Xiangsen Wang

Neural Information Processing SystemsOct-9-2025, 03:35:21 GMT

Despite some success in the single-agent setting, offline multi-agent RL (MARL) remains to be a challenge. The large joint state-action space and the coupled multi-agent behaviors pose extra complexities for offline policy optimization.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Hierarchical Decentralized Stochastic Control for Cyber-Physical Systems

Kaza, Kesav, Anantharaman, Ramachandran, Meshram, Rahul

arXiv.org Artificial IntelligenceAug-28-2025

This paper introduces a two-timescale hierarchical decentralized control architecture for Cyber-Physical Systems (CPS). The system consists of a global controller (GC), and N local controllers (LCs). The GC operates at a slower timescale, imposing budget constraints on the actions of LCs, which function at a faster timescale. Applications can be found in energy grid planning, wildfire management, and other decentralized resource allocation problems. We propose and analyze two optimization frameworks for this setting: COpt and FOpt. In COpt, both GC and LCs together optimize infinite-horizon discounted rewards, while in FOpt the LCs optimize finite-horizon episodic rewards, and the GC optimizes infinite-horizon rewards. Although both frameworks share identical reward functions, their differing horizons can lead to different optimal policies. In particular, FOpt grants greater autonomy to LCs by allowing their policies to be determined only by local objectives, unlike COpt. To our knowledge, these frameworks have not been studied in the literature. We establish the formulations, prove the existence of optimal policies, and prove the convergence of their value iteration algorithms. We further show that COpt always achieves a higher value function than FOpt and derive explicit bounds on their difference. Finally, we establish a set of sufficient structural conditions under which the two frameworks become equivalent.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2506.22971

Country: North America (0.28)

Genre: Research Report (0.40)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Add feedback

Local Policies Enable Zero-shot Long-horizon Manipulation

Dalal, Murtaza, Liu, Min, Talbott, Walter, Chen, Chen, Pathak, Deepak, Zhang, Jian, Salakhutdinov, Ruslan

arXiv.org Artificial IntelligenceOct-29-2024

Sim2real for robotic manipulation is difficult due to the challenges of simulating complex contacts and generating realistic task distributions. To tackle the latter problem, we introduce ManipGen, which leverages a new class of policies for sim2real transfer: local policies. Locality enables a variety of appealing properties including invariances to absolute robot and object pose, skill ordering, and global scene configuration. We combine these policies with foundation models for vision, language and motion planning and demonstrate SOTA zero-shot performance of our method to Robosuite benchmark tasks in simulation (97%). We transfer our local policies from simulation to reality and observe they can solve unseen long-horizon manipulation tasks with up to 8 stages with significant pose, object and scene configuration variation. ManipGen outperforms SOTA approaches such as SayCan, OpenVLA, LLMTrajGen and VoxPoser across 50 real-world manipulation tasks by 36%, 76%, 62% and 60% respectively. Video results at https://mihdalal.github.io/manipgen/

large language model, local policy, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.22332

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

Bui, The Viet, Nguyen, Thanh Hong, Mai, Tien

arXiv.org Artificial IntelligenceOct-2-2024

Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets without the need for further environmental interactions. While promising results have been demonstrated in single-agent settings, offline multi-agent reinforcement learning (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors. A key issue in offline RL is the distributional shift, which arises when the target policy being optimized deviates from the behavior policy that generated the data. This problem is exacerbated in MARL due to the interdependence between agents' local policies and the expansive joint state-action space. Prior approaches have primarily addressed this challenge by incorporating regularization in the space of either Q-functions or policies. In this work, we introduce a regularizer in the space of stationary distributions to better handle distributional shift. Our algorithm, ComaDICE, offers a principled framework for offline cooperative MARL by incorporating stationary distribution regularization for the global learning policy, complemented by a carefully structured multi-agent value decomposition strategy to facilitate multi-agent training. Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks. Over the years, deep RL has achieved remarkable success in various decision-making tasks (Levine et al., 2016; Silver et al., 2017; Kalashnikov et al., 2018; Haydari & Yılmaz, 2020). However, a significant limitation of deep RL is its need for millions of interactions with the environment to gather experiences for policy improvement.

algorithm, comadice and baseline, reinforcement learning, (9 more...)

arXiv.org Artificial Intelligence

2410.01954

Country:

North America > United States > Oregon > Lane County > Eugene (0.14)
Asia > Singapore (0.04)
North America > United States > Ohio > Lucas County > Oregon (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

Chen, Wentse, Huang, Shiyu, Schneider, Jeff

arXiv.org Artificial IntelligenceJun-19-2024

Multi-agent reinforcement learning (MARL) tasks often utilize a centralized training with decentralized execution (CTDE) framework. QMIX is a successful CTDE method that learns a credit assignment function to derive local value functions from a global value function, defining a deterministic local policy. However, QMIX is hindered by its poor exploration strategy. While maximum entropy reinforcement learning (RL) promotes better exploration through stochastic policies, QMIX's process of credit assignment conflicts with the maximum entropy objective and the decentralized execution requirement, making it unsuitable for maximum entropy RL. In this paper, we propose an enhancement to QMIX by incorporating an additional local Q-value learning method within the maximum entropy RL framework. Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions. Due to the monotonicity of the QMIX value function, these updates ensure that locally optimal actions align with globally optimal actions. We theoretically prove the monotonic improvement and convergence of our method to an optimal solution. Experimentally, we validate our algorithm in matrix games, Multi-Agent Particle Environment and demonstrate state-of-the-art performance in SMAC-v2.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2406.1393

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Algorithms for learning value-aligned policies considering admissibility relaxation

Holgado-Sánchez, Andrés, Arias, Joaquín, Billhardt, Holger, Ossowski, Sascha

arXiv.org Artificial IntelligenceJun-7-2024

The emerging field of \emph{value awareness engineering} claims that software agents and systems should be value-aware, i.e. they must make decisions in accordance with human values. In this context, such agents must be capable of explicitly reasoning as to how far different courses of action are aligned with these values. For this purpose, values are often modelled as preferences over states or actions, which are then aggregated to determine the sequences of actions that are maximally aligned with a certain value. Recently, additional value admissibility constraints at this level have been considered as well. However, often relaxed versions of these constraints are needed, and this increases considerably the complexity of computing value-aligned policies. To obtain efficient algorithms that make value-aligned decisions considering admissibility relaxation, we propose the use of learning techniques, in particular, we have used constrained reinforcement learning algorithms. In this paper, we present two algorithms, $\epsilon\text{-}ADQL$ for strategies based on local alignment and its extension $\epsilon\text{-}CADQL$ for a sequence of decisions. We have validated their efficiency in a water distribution problem in a drought scenario.

algorithm, alignment, inhabitant, (16 more...)

arXiv.org Artificial Intelligence

2406.04838

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.46)
Law (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Federated Offline Policy Optimization with Dual Regularization

Yue, Sheng, Qin, Zerui, Hua, Xingyuan, Deng, Yongheng, Ren, Ju

arXiv.org Artificial IntelligenceMay-28-2024

Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named $\texttt{DRPO}$, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. $\texttt{DRPO}$ leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, demonstrating that by achieving the right balance thereof, $\texttt{DRPO}$ can effectively counteract distributional shifts and ensure strict policy improvement in each federative learning round. Extensive experiments validate the significant performance gains of $\texttt{DRPO}$ over baseline methods.

agent, algorithm, drpo, (13 more...)

arXiv.org Artificial Intelligence

2405.17474

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

Add feedback

Filters

Collaborating Authors

local policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Guided Policy Search via Approximate Mirror Descent

a46c84276e3a4249ab7dbf3e069baf7f-Paper-Conference.pdf

FOVA: Offline Federated Reinforcement Learning with Mixed-Quality Data

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization Xiangsen Wang

Hierarchical Decentralized Stochastic Control for Cyber-Physical Systems

Local Policies Enable Zero-shot Long-horizon Manipulation

ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

Algorithms for learning value-aligned policies considering admissibility relaxation

Federated Offline Policy Optimization with Dual Regularization